Applications in Computer Vision
FIGURE 6.10
Our LWS-Det. From left to right are the input, search, and learning processes. For a given 1-
bit convolution layer, LWS-Det first searches for the binary weight (+1 or −1) by minimizing
the angular loss supervised by a real-valued teacher detector. LWS-Det learns the real-valued
scale factor α to enhance the feature representation ability.
where ⊗ is the convolution operation. We omit the batch normalization (BN) and activation layers for simplicity. The 1-bit model aims to quantize w_i and a_i into ŵ_i ∈ {−1, +1} and â_i ∈ {−1, +1}, using efficient xnor and bit-count operations to replace the full-precision operations. Following [99], the forward process of the 1-bit CNN is:
a_i = sign(a_{i−1} ⊙ w_i),    (6.66)
where ⊙ represents the xnor and bit-count operations, and sign(·) denotes the sign function, which returns +1 if the input is greater than zero and −1 otherwise. This binarization introduces a binarization error, as can be seen in Figs. 6.11 (a) and (b): the product of the 1-bit convolution (b) cannot match the real-valued product (a) in either angle or amplitude.
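The equivalence between the binary dot product and the xnor-plus-bit-count trick that Eq. 6.66 relies on can be checked in a few lines of NumPy. This is a toy sketch with made-up vector sizes, not the detector's actual layers:

```python
import numpy as np

def sign(x):
    # sign(.) as defined above: +1 if the input is > 0, else -1
    return np.where(x > 0, 1.0, -1.0)

# Toy real-valued activation and weight vectors (a flattened conv patch).
rng = np.random.default_rng(0)
a_prev = rng.standard_normal(16)
w = rng.standard_normal(16)

# 1-bit forward pass: binarize, then take the binary dot product.
a_b = sign(a_prev)
w_b = sign(w)
dot = float(np.dot(a_b, w_b))

# The same result via xnor + bit-count on {0, 1} encodings:
# map -1 -> 0, +1 -> 1; xnor counts agreeing bits, and the
# +/-1 dot product equals 2 * (number of agreements) - n.
a_bits = (a_b > 0)
w_bits = (w_b > 0)
agreements = int(np.sum(~(a_bits ^ w_bits)))
dot_xnor = 2 * agreements - a_bits.size

assert dot == dot_xnor
```

In hardware, the boolean xnor and population count replace the floating-point multiply-accumulate, which is where the 1-bit speedup comes from.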
Substantial efforts have been made to reduce this error. [199, 228] formulate the objective as
L_i^w = ∥w_i − α_i ◦ ŵ_i∥_2^2,    (6.67)
where ◦ denotes channel-wise multiplication and α_i is the vector of channel-wise scale factors. As shown in Fig. 6.11 (c), [199, 228] learn α_i by directly optimizing L_i^w toward 0, and thus the explicit solution is
α_i^j = ∥w_i^j∥_1 / (C_{i−1} · K_i^j · K_i^j),    (6.68)
where j denotes the j-th channel of the i-th layer. Other works [77] evaluate Eq. 6.68 dynamically rather than solving it explicitly, or modify α_i to other shapes [26].
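As a quick sanity check of the closed-form solution in Eq. 6.68, the NumPy sketch below (channel and kernel sizes are illustrative assumptions) computes α for one output channel and verifies that it reduces the reconstruction error of Eq. 6.67 compared with unscaled binarization:

```python
import numpy as np

rng = np.random.default_rng(1)
C_in, K = 8, 3                             # assumed input channels, kernel size
w_j = rng.standard_normal((C_in, K, K))    # one output channel of layer i

# Closed-form scale factor from Eq. 6.68: alpha = ||w||_1 / (C_{i-1} * K * K),
# i.e., the mean absolute value of the channel's weights.
alpha = np.abs(w_j).sum() / (C_in * K * K)

w_hat = np.where(w_j > 0, 1.0, -1.0)       # binarized weights in {-1, +1}

# L2 reconstruction error of Eq. 6.67, with and without the scale factor.
err_scaled = float(np.sum((w_j - alpha * w_hat) ** 2))
err_plain = float(np.sum((w_j - w_hat) ** 2))

# alpha minimizes the error over all scalars, so it can only help.
assert err_scaled <= err_plain
```

Setting the derivative of Eq. 6.67 with respect to α to zero gives exactly this mean-absolute-value solution, which is why it can be read off without any iterative optimization.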
Previous work mainly focuses on kernel reconstruction but neglects angular information, as shown in Fig. 6.11 (d). One drawback of existing methods lies in their ineffectiveness when binarizing very small floating-point values, as shown in Fig. 6.11. In contrast, we leverage the strong capacity of a differentiable search to fully explore the binary space for an ideal combination of −1 and +1, without an ambiguous binarization process involved.
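The limitation is easy to illustrate: angles are invariant to positive scaling, so the channel-wise α of Eq. 6.68 can correct the amplitude of the binarized kernel but not its angular error. A toy NumPy check (kernel size assumed) makes this concrete:

```python
import numpy as np

def angle_deg(u, v):
    # angle between two flattened kernels, in degrees
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(2)
w = rng.standard_normal(27)            # a flattened 3x3x3 kernel (assumed size)
w_hat = np.where(w > 0, 1.0, -1.0)     # sign binarization
alpha = np.abs(w).mean()               # closed-form scale of Eq. 6.68

# The scale factor shrinks the amplitude (L2) error of Eq. 6.67 ...
assert np.sum((w - alpha * w_hat) ** 2) < np.sum((w - w_hat) ** 2)

# ... but leaves the angular error untouched: scaling by alpha > 0
# does not move the binarized kernel's direction at all.
assert np.isclose(angle_deg(w, w_hat), angle_deg(w, alpha * w_hat))
```

This is why the search stage of LWS-Det targets the angular loss directly instead of relying on scale factors alone.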
6.4.2
Formulation of LWS-Det
We regard the 1-bit object detector as a student network, which can be searched and learned
based on a teacher network (real-valued detector) layer by layer. Our overall framework is